Volatile Decision Dynamics: 

Experiments, Stochastic Description, 
Intermittency Control, and Traffic Optimization 

Dirk Helbing, 1,2,3 Martin Schonhof, 1 and Daniel Kern 1 

1 Institute for Economics and Traffic, Dresden University of Technology, 
D-01062 Dresden, Germany, helbing@trafficforum.org, www.helbing.org 

2 Collegium Budapest — Institute for Advanced Study, 
Szentharomsag u. 2, H-1014 Budapest, Hungary 

3 CCM — Centro de Ciencias Matematicas, Universidade da Madeira, 
Campus Universitario da Penteada, Pt-9000-390 Funchal, Madeira, Portugal 

February 1, 2008 



The coordinated and efficient distribution of limited resources by indi- 
vidual decisions is a fundamental, unsolved problem. When individuals 
compete for road capacities, time, space, money, goods, etc., they nor- 
mally make decisions based on aggregate rather than complete informa- 
tion, such as TV news or stock market indices. In related experiments, we 
have observed a volatile decision dynamics and far-from-optimal payoff 
distributions. We have also identified ways of information presentation 
that can considerably improve the overall performance of the system. 
In order to determine optimal strategies of decision guidance by means 
of user-specific recommendations, a stochastic behavioural description 
is developed. These strategies manage to increase the adaptability to 
changing conditions and to reduce the deviation from the time-dependent 
user equilibrium, thereby enhancing the average and individual payoffs. 
Hence, our guidance strategies can increase the performance of all users 
by reducing overreaction and stabilizing the decision dynamics. These re- 
sults are highly significant for predicting decision behaviour, for reaching 
optimal behavioural distributions by decision support systems, and for 
information service providers. One of the promising fields of application 
is traffic optimization. 
1 Introduction 



Optimal route guidance strategies in overloaded traffic networks, for example, require 
reliable traffic forecasts (see Fig. 1). These are extremely challenging for two 
reasons: First of all, traffic dynamics is very complex. However, after more than 50 
years of traffic research, physicists have recently gained at least a semi-quantitative 
understanding of it based on the concept of self-driven, non-linearly interacting 
many-particle systems 0. The second and more serious problem is the invalidation 
of forecasts by the driver reactions to route choice recommendations. Nevertheless, 
some keen scientists hope to solve this long-standing problem by means of an it- 
eration scheme ||, |], || [7[ §, |H| [11]: If the driver reaction was known from 
experiments 0, [IS], 0, [13} 0, |T7|, |Ig 0, [22| , the resulting traffic situation 
could be calculated, yielding improved route choice recommendations, etc. Provided that 
this iteration scheme converges, it would facilitate optimal recommendations and 
reliable traffic forecasts anticipating the driver reactions. Based on empirically de- 
termined transition and compliance probabilities, we will develop a new procedure 
in the following, which would even allow us to reach the optimal traffic distribution 
in one single step and in harmony with the forecast. 



Figure 1: Schematic illustration of a day-to-day route choice scenario. Each day, the 
drivers have to decide between two alternative routes, 1 and 2. Note that, due to 
the different number of lanes, route 1 has a higher capacity than route 2. The latter 
is, therefore, used by fewer cars. 

The solution of this difficult and practically relevant problem requires several con- 
cepts and methods from physics. First of all, we will identify the essential role and 
reason of intermittency in scenarios with repeated decisions (see Sec. 3). Second, we 
will derive a non-linear feedback mechanism for intermittency control (see Sec. 4). 
In addition, we will develop a stochastic description of the decision behavior (see 
Sec. 5) and evaluate the corresponding transition and compliance probabilities, 
including their time dependence (see Sec. 4). 



2 Experimental setup and previous results 



To determine the route choice behavior, Schreckenberg, Selten et al. |B| have recently 
carried out a decision experiment (see Fig. 2). N test persons had to 
repeatedly decide between two alternatives 1 and 2 (the routes) and should try to 
maximize their resulting payoffs (describing something like the speeds or inverse 
Figure 2: Schematic illustration of the decision experiment. Several test persons 
have to take decisions based on the aggregate information their computer displays. 
The computers are connected and can, therefore, exchange information. However, a 
direct communication among players is suppressed. 



travel times). To reflect the competition for a limited resource (the road capacity), 
the received payoffs 

$$P_1(n_1) = P_1^0 - P_1^1 n_1 \quad\text{and}\quad P_2(n_2) = P_2^0 - P_2^1 n_2 \qquad (1)$$

went down with the numbers of test persons $n_1$ and $n_2 = N - n_1$ deciding for alternatives 
1 and 2, respectively. The user equilibrium corresponding to equal payoffs 
for both alternative decisions is found for a fraction 

$$f_1^{\rm eq} = \frac{n_1}{N} = \frac{P_2^1}{P_1^1 + P_2^1} + \frac{1}{N}\,\frac{P_1^0 - P_2^0}{P_1^1 + P_2^1} \qquad (2)$$

of persons choosing alternative 1. The system optimum corresponds to the maximum 
of the total payoff $n_1 P_1(n_1) + n_2 P_2(n_2)$, which lies by an amount of 

$$\frac{1}{2N}\,\frac{P_1^0 - P_2^0}{P_1^1 + P_2^1} \qquad (3)$$

below the user optimum. Since this difference shrinks as $1/N$, only experiments with 
a few players make it possible to find 
out whether the test persons adapt to the user or the system optimum. Small groups 
are also more suitable for the experimental investigation of the fluctuations in the 
system and of the long-term adaptation behavior. Schreckenberg, Selten et al. found 
that, on average, the test groups adapted relatively well to the user equilibrium. 
However, although it appears reasonable to stick to the same decision once the 
equilibrium is reached, the standard deviation stayed at a finite level. This was not 
only observed in "treatment" 1, where all players knew only their own (previously 
experienced) payoff, but also in treatment 2, where the payoffs $P_1(n_1)$ and $P_2(n_2)$ for 
both 1- and 2-decisions were transmitted to all players (analogous to radio news). 
Nevertheless, treatment 2 could decrease the changing rate and increase the average 
payoffs. For details regarding the statistical analysis see Ref. 
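
The relations (1) to (3) can be checked numerically. The following is a minimal sketch, not the authors' code; the function names are hypothetical, and the parameter values (N = 9, P_1^0 = 34, P_1^1 = 4, P_2^0 = 28, P_2^1 = 6) are an assumption read from the caption of Fig. 3.

```python
from fractions import Fraction

def payoffs(n1, N, P10, P11, P20, P21):
    """Linear payoffs of Eq. (1): P_i(n_i) = P_i^0 - P_i^1 * n_i."""
    n2 = N - n1
    return P10 - P11 * n1, P20 - P21 * n2

def user_equilibrium_fraction(N, P10, P11, P20, P21):
    """Fraction f_1^eq of Eq. (2), at which both alternatives pay the same."""
    return Fraction(P21, P11 + P21) + Fraction(P10 - P20, N * (P11 + P21))

def system_optimum_fraction(N, P10, P11, P20, P21):
    """Fraction maximizing n1*P1(n1) + n2*P2(n2): Eq. (2) minus the amount (3)."""
    shift = Fraction(P10 - P20, 2 * N * (P11 + P21))
    return user_equilibrium_fraction(N, P10, P11, P20, P21) - shift

# Assumed parameter values (read from the caption of Fig. 3).
params = dict(N=9, P10=34, P11=4, P20=28, P21=6)
f_eq = user_equilibrium_fraction(**params)
f_so = system_optimum_fraction(**params)
n1_eq = f_eq * params["N"]   # number of 1-choosers in the user equilibrium
```

With these values the user equilibrium lies at six 1-choosers with a payoff of 10 for everyone, and the system optimum differs from it by a fraction of only 1/30, which illustrates why small groups are needed to tell the two apart.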



To explain the mysterious persistence in the changing behavior and explore possi- 
bilities to suppress it, we have repeated these experiments with more iterations and 
tested additional treatments. In the beginning, all treatments were consecutively 
applied to the same players in order to determine the response to different kinds of 
information (see Fig. 3). Afterwards, single treatments and variants of them have 
been repeatedly tested with different players to check our conclusions. Apart from 
Figure 3: Overview of treatments 1 to 5 (with N = 9 and payoff parameters $P_2^0 = 28$, 
$P_1^1 = 4$, $P_2^1 = 6$, and $P_1^0 = 34$ for $0 \le t \le 1500$, but a zigzag-like variation 
between $P_1^0 = 44$ and $P_1^0 = -6$ with a period of 50 for $1501 \le t \le 2500$): (a) 
Average number of decisions for alternative 1 (solid line) compared to the user 
equilibrium (dashed line), (b) standard deviation of the number of 1-decisions from 
the user equilibrium, (c) number of decision changes from one iteration to the next 
one, (d) average payoff per iteration for players who have changed their decision 
and for all players. The latter increased with a reduction in the changing rate, but 
normally stayed below the payoff in the user equilibrium (which is 1 on average in 
treatments 4 and 5, otherwise 10). The displayed moving time-averages [(a) over 40 
iterations, (b)-(d) over 100 iterations] illustrate the systematic response to changes 
in the treatment every 500 iterations. Dashed lines in (b)-(d) show estimates of the 
stationary values after the transient period (to guide the eyes), while time periods 
around the dotted lines are not significant. Compared to treatment 1, treatment 
3 managed to reduce the changing rate and to increase the average payoffs (three 
times more than treatment 2 did). These changes were systematic for all players 
(see Fig. 4). In treatment 4, the changing rate and the standard deviation went up, 
since the user equilibrium changed in time. The user-specific recommendations in 
treatment 5 could almost fully compensate for this. The above conclusions are also 
supported by additional experiments with single treatments. 
this, we have generalized the experimental setup in the sense that it was no longer 
restricted to route choice decisions: The test persons did not have any idea 
of the payoff functions in the beginning, but had to develop their own hypothesis 
about them. In particular, the players did not know that the payoff decreased with 
the number of persons deciding for the same alternative. 

In treatment 3, every test person was informed about his or her own payoff $P_1(n_1)$ [or 
$P_2(n_2)$] and the potential payoff 

$$P_2(N - n_1 + \epsilon N) = P_2(n_2) - \epsilon N P_2^1 \qquad (4)$$

[or $P_1(N - n_2 + \epsilon N) = P_1(n_1) - \epsilon N P_1^1$] he or she would have obtained if a fraction $\epsilon$ of 
persons had additionally chosen the other alternative (here: $\epsilon = 1/N = 1/9$). Treatments 
4 and 5 were variants of treatment 3, but some payoff parameters were changed 
in time to simulate varying environmental conditions. In treatment 5, each player 
additionally received an individual recommendation which alternative to choose. 

The higher changing rate in treatment 1 compared to treatment 2 can be understood 
as the effect of an exploration rate $\nu_1$ required to find out which alternative performs 
better. It is also plausible that treatment 3 could further reduce the changing rate: 
In the user equilibrium with $P_1(n_1) = P_2(n_2)$, every player knew that he or she 
would not get the same, but a reduced payoff, if he or she changed the decision. 
That explains why the new treatment 3 could reach a great adaptation performance, 
reflected by a very low standard deviation and almost optimal average payoffs. The 
behavioral changes induced by the treatments were not only observed on average, 
but for every single individual (see Fig. 4). Moreover, even the smallest individual 
cumulative payoff exceeded the highest one in treatment 1. Therefore, treatment 3's 
way of information presentation is much superior to the ones used today. 
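
As a worked illustration of the information shown in treatment 3, assume the parameter values read from the caption of Fig. 3 (N = 9, P_1^0 = 34, P_1^1 = 4, P_2^0 = 28, P_2^1 = 6) and a fraction of 1/N, i.e. one additional player on the other alternative. In the user equilibrium with six 1-choosers and three 2-choosers, the displayed numbers would then be:

```latex
\begin{align*}
P_1(6) &= 34 - 4 \cdot 6 = 10, &
P_2(3) &= 28 - 6 \cdot 3 = 10, \\
P_2(3 + 1) &= P_2(3) - \epsilon N P_2^1 = 10 - 6 = 4, &
P_1(6 + 1) &= P_1(6) - \epsilon N P_1^1 = 10 - 4 = 6 .
\end{align*}
```

Both potential payoffs (4 for a previous 1-chooser, 6 for a previous 2-chooser) lie below the current payoff of 10, so every player can see directly that a change at equilibrium would not pay off.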



3 Explaining the volatile decision dynamics 

In this section, we will investigate why players changed their decision in the user 
equilibrium at all. The reason for the persistent changing behavior can be revealed 
by a more detailed analysis of the individual decisions in treatment 3. Figure 5 shows 
some kind of intermittent behavior, i.e. quiescent periods without changes, followed 
by turbulent periods with many changes. This is reminiscent of volatility clustering 
in stock market indices [27], [28], [29], where individuals also react to aggregate 
information reflecting all decisions (the trading transactions). Single players seem to 
change their decision to reach above-average payoffs. In fact, although the cumula- 
tive individual payoff is anticorrelated with the average changing rate, some players 
receive higher payoffs with larger changing rates than others. They profit from the 
Figure 4: Comparison of the individual decision behaviors under (a) treatment 1, 
(b) treatment 2, and (c) treatment 3. The upper values correspond to a decision for 
alternative 2, the lower ones for alternative 1. Note that some test persons showed 
similar behaviors (either more or less the same or almost opposite ones), although 
they could not talk to each other. This shows that there are some typical strategies 
how to react to specific information. The group has, in fact, to develop complementary 
strategies in order to reach a good adaptation performance. Identical strategies 
would perform poorly (as in the minority game [23], [24], [25], [26]). Despite the mentioned 
complementary behavior, there is a characteristic reaction to changes in the 
treatment. For example, compared to treatment 2 all players reduce their changing 
rate in treatment 3. 
Figure 5: Illustration of typical results for treatment 3 (which was here the only 
treatment applied to the test persons, in contrast to Fig. 4). (a) Decisions of all 9 
players. Players are displayed from the top to the bottom in the order of increasing 
changing rate. Although the ranking of the cumulative payoff and the changing rate 
are anticorrelated, the relation is not monotonic. Note that turbulent or volatile 
periods characterized by many decision changes are usually triggered by individual 
changes after quiescent periods (dotted lines). (b) The changing rate is mostly larger 
than the (standard) deviation from the user equilibrium $n_1 = f_1^{\rm eq} N = 6$, indicating 
an overreaction in the system. 
overreaction in the system. Once the system is out of equilibrium, all players respond 
in one way or another. Typically, there are too many decision changes (see 
Figs. 5 and 6). The corresponding overcompensation, which had also been predicted 
by computer simulations || ||, |9], [10], [I7|, [HJ, gives rise to "turbulent" periods. 



We should, however, note that the calm periods without decision changes tend to 
become longer in the course of time. That is, after a very long time period the 
individuals learn not to change their behavior when the user equilibrium is reached. 
This is not only found in Fig. ^j, but also visible in Fig. ^]c after about 800 iterations. 
In larger systems (with more participants) this transient period would take even 
longer, so that this stabilization effect cannot be observed in experiments with fewer 
iterations or more test persons. 
Figure 6: Measured overreaction, i.e., difference between the actual number of decision 
changes (the changing rate) and the required one (the standard deviation). 
The overreaction can be significantly influenced by the treatment, i.e. the way of 
information presentation. The minimum overreaction was reached by treatment 5, 
i.e. user-specific recommendations. 
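
The overreaction measure described in the caption of Fig. 6 can be sketched as follows. This is only an illustration under stated assumptions: the function name and the toy data are hypothetical, the changing rate is taken as the mean number of decision changes per iteration, and the deviation as the root-mean-square distance of the number of 1-decisions from the user equilibrium. The averaging windows used in the paper are not reproduced.

```python
import math

def overreaction(decisions, n1_eq):
    """decisions: list of iterations, each a list of choices (1 or 2) per player.

    Returns (changing rate, deviation from equilibrium, overreaction), where
    the changing rate is the mean number of decision changes per iteration and
    the deviation is the root-mean-square distance of n1 from n1_eq.
    """
    changes = [
        sum(a != b for a, b in zip(prev, cur))
        for prev, cur in zip(decisions, decisions[1:])
    ]
    changing_rate = sum(changes) / len(changes)
    n1 = [row.count(1) for row in decisions]
    deviation = math.sqrt(sum((n - n1_eq) ** 2 for n in n1) / len(n1))
    return changing_rate, deviation, changing_rate - deviation

# Hypothetical toy data: 3 players, 4 iterations, equilibrium at n1 = 2.
toy = [[1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 1, 2]]
rate, dev, over = overreaction(toy, n1_eq=2)
```

In the toy data the group sits exactly at the equilibrium in every iteration, yet two players swap places each time, so all of the changing rate counts as overreaction.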

Finally, we should stress that other interpretations of the rather persistent decision 
changes have been ruled out, for example, an unstable user equilibrium or a compe- 
tition between the user optimum and the system optimum. High changing rates also 
occur if the user and system equilibrium agree, and if the payoff functions $P_1(n_1)$ 
and $P_2(n_2)$ are the same (i.e. $f_1^{\rm eq} = 1/2$). 



4 Decision and intermittency control by non- 
linear feedback based on guidance strategies 

To avoid overreaction, in treatment 5 we have recommended a number 
$f_1^{\rm eq}(t+1)N - n_1(t)$ of players to change their decision and the other ones to keep it. 
These user-specific recommendations helped the players to reach the smallest overreaction 
of all treatments (see Fig. 6) and a very low standard deviation, although 
the payoffs were changing in time (see Fig. ^j. Treatment 4 shows how the group 
performance was affected by the time-dependent user equilibrium: Even without 
recommendations, the group managed to adapt to the changing conditions surpris- 
ingly well, but the standard deviation and changing rate were approximately as high 
as in treatment 2 (see Fig. 3). This adaptability (the collective "group intelligence") 
is based on complementary responses (direct and contrary ones [19], "movers" and 
"stayers", cf. Fig. 4). That is, if some players do not react to the changing conditions, 
others will take the chance to earn additional payoff. This experimentally supports 
the behavior assumed in the theory of efficient markets, but here the efficiency is 
limited by overreaction. 




Figure 7: Representative examples for (a) treatment 4 and (b) treatment 5. The 
displayed curves are moving time-averages over 20 iterations. Compared to treatment 
4, the user-specific recommendations in treatment 5 (assuming $C_M = C_S = 1$, 
$R_1 = 0$, $R_2 = \min([f_1^{\rm eq}(t+1)N - n_1(t) + B(t+1)]/n_2(t), 1)$, $I_1 = I_2 = 1$) could 
increase the group adaptability to the user equilibrium a lot, even if they had a 
systematic or random bias B (see Fig. 8a). The standard deviation was reduced 
considerably and the changing rate even more. 
In most experiments, we found a constant and high compliance $C_S(t) \approx 0.92$ 
with recommendations to stay, but the compliance $C_M(t)$ with recommendations 
to change (to 'move') [15], [16], [31], [32] turned out to vary in time. It decreased with 
the reliability of the recommendations (see Fig. 8), which again dropped with the 
compliance. 
Figure 8: (a) In treatment 5, the compliance to recommendations to change dropped 
considerably below the compliance to recommendations to stay. The compliance to 
changing recommendations was very sensitive to the degree of their reliability, i.e. 
participants followed recommendations just as much as they helped them to reach 
the user equilibrium (so that the bias B did not affect the small deviation from it, 
see Fig. 7b). While during time interval A, the recommendations would have been 
perfect, if all players had followed them, in time interval B the user equilibrium was 
overestimated by B = +1, in C it was underestimated by B = −2, in D it was 
randomly over- or underestimated by B = ±1, and in E by B = ±2. Obviously, a 
random error is more serious than a systematic one of the same amplitude. Dotted 
non-vertical lines illustrate the estimated compliance levels during the transient periods 
and afterwards (horizontal dotted lines). (b) The average payoffs varied largely 
with the decision behavior. Players who changed their decision got significantly lower 
payoffs on average than those who kept their previous decision. Even recommenda- 
tions could not overcome this difference: It stayed profitable not to change, although 
it was generally better to follow recommendations than to refuse them. For illustra- 
tive reasons, the third and fourth line were shifted by 15, while the fifth and sixth 
line were shifted by 30 iterations. 

Based on this knowledge, we have developed a model of how the competition for 
limited resources (such as road capacity) could be optimally guided by means of 
information services. Let us assume we had $n_1(t)$ 1-decisions at time $t$, but the 
optimal number of 1-decisions at time $t+1$ is calculated to be $f_1^{\rm eq}(t+1)N > n_1(t)$. 
Our aim is to balance the deviation $f_1^{\rm eq}(t+1)N - n_1(t) > 0$ by the expected net 
number 

$$\langle \Delta n_1(t+1) \rangle = \langle n_1(t+1) - n_1(t) \rangle = \langle n_1(t+1) \rangle - n_1(t) \qquad (5)$$

of transitions from decision 2 to decision 1, i.e. $f_1^{\rm eq}(t+1)N - n_1(t) = \langle \Delta n_1(t+1) \rangle$. 
In the case $f_1^{\rm eq}(t+1)N - n_1(t) < 0$, indices 1 and 2 have to be interchanged. 

Let us assume we give recommendations to fractions $I_1(t)$ and $I_2(t)$ of players who 
had chosen decision 1 and 2, respectively. The fraction of changing recommendations 
to previous 1-choosers shall be denoted by $R_1(t)$, and for previous 2-choosers 
by $R_2(t)$. Correspondingly, fractions of $[1 - R_1(t)]$ and $[1 - R_2(t)]$ receive a recommendation 
to stick to the previous decision. Moreover, $[1 - C_M(t)]$ is the refusal 
probability of recommendations to change, while $[1 - C_S(t)]$ is the refusal probability 
of recommendations to stay. Finally, we denote the spontaneous transition probability 
from decision 1 to 2 by $p^a(2|1, n_1; t)$ and the inverse transition probability by 
$p^a(1|2, n_1; t)$, in case a player does not receive any recommendation. This happens 
with probabilities $[1 - I_1(t)]$ and $[1 - I_2(t)]$, respectively. Both transition probabilities 
$p^a(2|1, n_1; t)$ and $p^a(1|2, n_1; t)$ are functions of the number $n_1(t) = N - n_2(t)$ of 
previous 1-decisions. The index $a$ allows us to reflect different strategies or characters 
of players. The fraction of players pursuing strategy $a$ is then denoted by $F^a(t)$. 
Applying methods summarized in Ref. [33], the expected change $\langle \Delta n_1(t+1) \rangle$ of $n_1$ 
is given by the balance equation 

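$$\begin{aligned}
\langle \Delta n_1(t+1) \rangle = {} & \sum_a p^a(1|2, n_1; t)\, F^a(t)\, [1 - I_2(t)]\, n_2(t) \\
& - \sum_a p^a(2|1, n_1; t)\, F^a(t)\, [1 - I_1(t)]\, n_1(t) \\
& + \sum_a \{ C_M^a(t) R_2(t) + [1 - C_S^a(t)][1 - R_2(t)] \}\, F^a(t)\, I_2(t)\, n_2(t) \\
& - \sum_a \{ C_M^a(t) R_1(t) + [1 - C_S^a(t)][1 - R_1(t)] \}\, F^a(t)\, I_1(t)\, n_1(t) . \qquad (6)
\end{aligned}$$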

Together with the requirement 

$$\langle \Delta n_1(t+1) \rangle = f_1^{\rm eq}(t+1)N - n_1(t) , \qquad (7)$$

this equation defines, with respect to the number $n_1$ of previous 1-decisions, a non-linear 
feedback or control strategy. 
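
To make the control strategy concrete, the balance equation (6) and the requirement (7) can be sketched in code for a single player type, i.e. with the overall transition and compliance probabilities instead of the sums over characters. The function names are hypothetical, and fixing the information levels and the fraction of changing recommendations to 1-choosers while solving only for the fraction sent to 2-choosers is just one of several possible choices, clipped to the admissible range between 0 and 1.

```python
def expected_change(n1, n2, p12, p21, I1, I2, R1, R2, C_M, C_S):
    """Right-hand side of Eq. (6) for a single player type.

    p12 is the spontaneous transition probability 2 -> 1, p21 the one 1 -> 2.
    """
    spontaneous = p12 * (1 - I2) * n2 - p21 * (1 - I1) * n1
    guided_in = (C_M * R2 + (1 - C_S) * (1 - R2)) * I2 * n2
    guided_out = (C_M * R1 + (1 - C_S) * (1 - R1)) * I1 * n1
    return spontaneous + guided_in - guided_out

def solve_R2(target, n1, n2, p12, p21, I1, I2, R1, C_M, C_S):
    """Solve Eq. (7), expected_change == target, for R2 and clip it to [0, 1]."""
    base = expected_change(n1, n2, p12, p21, I1, I2, R1, 0.0, C_M, C_S)
    slope = (C_M + C_S - 1) * I2 * n2   # derivative of Eq. (6) with respect to R2
    if slope <= 0:
        raise ValueError("R2 has no positive effect on the expected change")
    return min(1.0, max(0.0, (target - base) / slope))

# Example: 4 of 9 players on alternative 1, equilibrium at 6, so target = 2.
# C_S = 0.92 is the measured value quoted in the text; C_M = 0.7 is hypothetical.
R2 = solve_R2(target=2, n1=4, n2=5, p12=0.0, p21=0.0,
              I1=1.0, I2=1.0, R1=0.0, C_M=0.7, C_S=0.92)
```

With perfect compliance the result reduces to the number of required changes divided by the number of 2-choosers, here 2/5. With imperfect compliance more players have to be asked to change than are actually needed.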

Note that, for Eq. (6), it was not necessary to distinguish different characters $a$. We 
have, therefore, evaluated the overall transition probabilities 

$$p(1|2, n_1; t) = \sum_a p^a(1|2, n_1; t) F^a(t) \quad\text{and}\quad p(2|1, n_1; t) = \sum_a p^a(2|1, n_1; t) F^a(t) . \qquad (8)$$

According to classical decision theories [34], [35], [36], we would expect that 
the transition probabilities $p^a(1|2, n_1; t)$ and $p(1|2, n_1; t)$ should be monotonically 
increasing functions of the payoff $P_1(n_1(t))$, the payoff difference 
$P_1(n_1(t)) - P_2(N - n_1(t))$, the potential payoff $P_1(n_1(t) + \epsilon N)$, or the potential payoff gain 
$P_1(n_1(t) + \epsilon N) - P_2(N - n_1(t))$. All these quantities vary linearly with $n_1$, so that 
$p(1|2, n_1; t)$ should be a monotonic function of $n_1(t)$. A similar thing should apply to 
$p(2|1, n_1; t)$. Instead, the experimental data point to transition probabilities with a 
minimum at the user equilibrium (see Fig. 9a). That is, the players stick to a certain 
alternative for a longer time, when the system is close to the user equilibrium. This 
is a result of learning [37] (see also Refs. [38], [39], [40], [41], [22]). In fact, we find a gradual 
change of the transition probabilities in time (see Fig. 9b). The corresponding 
"learning curves" reflect the players' adaptation to the user equilibrium. 
Figure 9: Illustration of decision distributions $P(i|n_1)$ and transition probabilities 
$p(i'|i, n_1; t)$ measured in treatment 3. (a) The probability $P(1|n_1)$ to choose alternative 
1 was approximately 2/3, independently of the number $n_1$ of players who 
had previously chosen alternative 1. The probability $P(2|n_1)$ to choose alternative 
2, given that $n_1$ players had chosen alternative 1, was always about 1/3. In contrast, 
the transition probability $p(1|2, n_1)$ describing decision changes from alternative 2 to 
1 did depend on the number $n_1$ of players who had chosen decision 1. The same was 
true for the inverse transition probability $p(2|1, n_1)$ from decision 1 to decision 2. 
Remarkably enough, these transition probabilities are not monotonically increasing 
with the payoff or the expected payoff gain, as they do not monotonically increase 
with $n_1$. Instead, the probability to change the decision shows a minimum at the user 
equilibrium $n_1 = f_1^{\rm eq} N = 6$. (b) The reason for the different transition probabilities 
is an adaptation process in which the participants learn to take fewer changing decisions 
when the user equilibrium is reached or close by, but more when the user 
equilibrium is far away. (The curves were exponentially smoothed with $\alpha = 0.05$.) 
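
Transition probabilities of the kind shown in Fig. 9 can be estimated from recorded decisions as relative frequencies. The sketch below shows one plausible way to do this and is not necessarily the evaluation procedure used for the figure; the function name and the toy data are hypothetical, and no time dependence or smoothing is included.

```python
from collections import defaultdict

def estimate_transition_probabilities(decisions):
    """Estimate p(1|2, n1) and p(2|1, n1) as relative frequencies.

    decisions: list of iterations, each a list of choices (1 or 2) per player.
    Returns two dicts mapping n1 (number of previous 1-choosers) to a probability.
    """
    changed = {1: defaultdict(int), 2: defaultdict(int)}   # keyed by previous choice
    total = {1: defaultdict(int), 2: defaultdict(int)}
    for prev, cur in zip(decisions, decisions[1:]):
        n1 = prev.count(1)
        for old, new in zip(prev, cur):
            total[old][n1] += 1
            changed[old][n1] += (old != new)
    p12 = {n1: changed[2][n1] / total[2][n1] for n1 in total[2]}
    p21 = {n1: changed[1][n1] / total[1][n1] for n1 in total[1]}
    return p12, p21

# Hypothetical toy data: 3 players, 4 iterations, always two 1-choosers.
toy = [[1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 1, 2]]
p12, p21 = estimate_transition_probabilities(toy)
```

In the toy data the single 2-chooser always switches and one of the two 1-choosers switches each time, so the estimates at two previous 1-choosers are 1 and 1/2, respectively.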

After the experimental determination of the transition probabilities $p(2|1, n_1; t)$, 
$p(1|2, n_1; t)$ and specification of the overall compliance probabilities 

$$C_M(t) = \sum_a C_M^a(t) F^a(t) , \qquad C_S(t) = \sum_a C_S^a(t) F^a(t) , \qquad (9)$$

we can guide the decision behavior in the system via the levels $I_i(t)$ of information 
dissemination and the fractions $R_i(t)$ of recommendations to change ($i \in \{1, 2\}$). 
These four degrees of freedom allow us to apply a variety of guidance strategies 
depending on the respective information medium. For example, a guidance by radio 
news is limited by the fact that $I_1(t) = I_2(t)$ is given by the average percentage of 
radio users. Therefore, equations (6) and (7) cannot always be solved by variation of 
the fractions of changing recommendations $R_i(t)$. User-specific services have much 
higher guidance potentials and could, for example, be transmitted via SMS. Among 
the different guidance strategies fulfilling equations (6) and (7), the one with the 
minimal statistical variance will be the best. However, it would already improve the 
present situation to inform everyone about the fractions $R_i(t)$ of participants who 
should change their decision, as users can learn to respond with varying probabilities 
(see Fig. 8). 

The outlined guidance strategy could, of course, also be applied to reach the system 
optimum rather than the user optimum. The values of $\Delta n_1(t+1)$ would just be 
different. Note, however, that the users would soon recognize that this guidance is 
not suitable to reach the user optimum. Consequently, the compliance probabilities 
$C_j(t)$ with $j \in \{M, S\}$ would gradually go down, which would affect the potentials 
and reliability of the guidance system. This problem can only be solved by a suit- 
able modification of the payoff functions, adapting the user optimum to the system 
optimum. 

In practical applications, we would determine the time-dependent compliance probabilities 
$C_j(t)$ (and the transition probabilities) on-line with an exponential smoothing 
procedure according to 

$$C_j(t+1) = \alpha C'_j(t) + (1 - \alpha) C_j(t) \quad\text{with}\quad \alpha \approx 0.1 , \qquad (10)$$

where $C'_j(t)$ is the percentage of participants who have followed their recommendation 
at time $t$. As the average payoff for decision changes is normally lower than for 
staying with the previous decision (see Figs. 8b and 3d), a high compliance probability 
$C_M$ is hard to achieve. That is, individuals who follow recommendations to 
change normally pay for reaching the user equilibrium (because of the overreaction 
in the system). Hence, there are no good preconditions to charge the players for 
recommendations, as we did in another treatment. Consequently, only a few play- 
ers requested recommendations, which reduced their reliability, so that the overall 
performance of the system went down. 
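
The on-line estimate of Eq. (10) amounts to a one-line update per iteration. The sketch below is only illustrative: the function name, the initial estimate, and the observation counts are hypothetical, and keeping the old estimate when nobody received a recommendation is an added assumption that the text does not address.

```python
def update_compliance(C_prev, followed, recommended, alpha=0.1):
    """One step of Eq. (10): C(t+1) = alpha * C'(t) + (1 - alpha) * C(t).

    C'(t) = followed / recommended is the share of players who followed their
    recommendation at time t; if nobody received one, the estimate is kept.
    """
    if recommended == 0:
        return C_prev
    observed = followed / recommended
    return alpha * observed + (1 - alpha) * C_prev

C_M = 0.5                      # hypothetical initial estimate
for followed, recommended in [(2, 2), (1, 2), (0, 1)]:
    C_M = update_compliance(C_M, followed, recommended)
```

A smoothing weight of about 0.1 means that roughly the last ten iterations dominate the estimate, so the guidance system can track a compliance that drifts with the reliability of the recommendations.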
5 Master equation description of iterated decisions 



The stochastic description of decisions that are taken at discrete time steps (e.g. on 
a day-to-day basis) is possible by means of the time-discrete master equation 

$$P(\mathbf{n}, t + \Delta t) = \sum_{\mathbf{n}'} P(\mathbf{n}, t + \Delta t \,|\, \mathbf{n}', t)\, P(\mathbf{n}', t) \qquad (11)$$

with $\Delta t = 1$ unit time. Herein, $P(\mathbf{n}, t)$ denotes the occurrence probability of the 
configuration $\mathbf{n} = (n_1, n_2)$ at time $t$. This vector comprises the occupation numbers 
$n_i$ and reflects the decision distribution in the system. As the number of individuals 
changing to the other alternative is given by a binomial distribution, we obtain the 
following expression for the configurational transition probability: 



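$$\begin{aligned}
& P\big((n_1, n_2), t+1 \,\big|\, (n_1 - \Delta n_1, n_2 + \Delta n_1), t\big) \\
& \quad = \sum_{k=0}^{\min(n_1 - \Delta n_1,\, n_2)} \binom{n_2 + \Delta n_1}{\Delta n_1 + k}\, p(1|2, n_1 - \Delta n_1; t)^{\Delta n_1 + k}\, [1 - p(1|2, n_1 - \Delta n_1; t)]^{n_2 - k} \\
& \qquad \times \binom{n_1 - \Delta n_1}{k}\, p(2|1, n_1 - \Delta n_1; t)^{k}\, [1 - p(2|1, n_1 - \Delta n_1; t)]^{n_1 - \Delta n_1 - k} . \qquad (12)
\end{aligned}$$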

This formula sums up the probabilities that $\Delta n_1 + k$ of $n_2 + \Delta n_1$ previous 2-choosers 
change independently to alternative 1 with probability $p(1|2, n_1 - \Delta n_1; t)$, while 
$k$ of the $n_1 - \Delta n_1$ previous 1-choosers change to alternative 2 with probability 
$p(2|1, n_1 - \Delta n_1; t)$, so that the net number of changes is $\Delta n_1$. If $\Delta n_1 < 0$, the roles 
of alternatives 1 and 2 have to be interchanged. Formulas (11) and (12) would look 
even more complicated, if we distinguished several characters $a$. We would, then, 
have to replace the binomial distributions by multinomial ones. 
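
The configurational transition probability (12) and one step of the master equation (11) can be sketched as follows for a single player type. The function names and the constant transition probabilities in the example are hypothetical. Instead of interchanging the roles of the alternatives for negative net changes, the sketch sums over all combinations of changes with the required net effect, which covers both signs.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def transition_probability(n1_new, n1_old, N, p12, p21):
    """P(n1_new at t+1 | n1_old at t) in the spirit of Eq. (12).

    p12 and p21 are the probabilities p(1|2, n1_old; t) and p(2|1, n1_old; t).
    Summing over all k with (2->1 changes) - (1->2 changes) = n1_new - n1_old
    covers positive and negative net changes alike.
    """
    n2_old = N - n1_old
    delta = n1_new - n1_old
    total = 0.0
    for k in range(n1_old + 1):          # k previous 1-choosers switch to 2
        j = delta + k                    # j previous 2-choosers switch to 1
        if 0 <= j <= n2_old:
            total += binom_pmf(j, n2_old, p12) * binom_pmf(k, n1_old, p21)
    return total

def master_equation_step(P, N, p12, p21):
    """One step of Eq. (11); P[n] is the probability of n1 = n, and
    p12(n), p21(n) are callables giving the transition probabilities."""
    return [
        sum(transition_probability(m, n, N, p12(n), p21(n)) * P[n] for n in range(N + 1))
        for m in range(N + 1)
    ]

# Hypothetical example: 9 players, all probability mass initially on n1 = 6.
N = 9
P0 = [0.0] * (N + 1)
P0[6] = 1.0
P1 = master_equation_step(P0, N, lambda n: 0.2, lambda n: 0.1)
```

From the resulting distribution, the mean and the variance of the number of 1-choosers follow directly, which is the quantity a guidance strategy would aim to keep small.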



The potential use of Eq. (12) is the calculation of the statistical variation of the 
decision distribution or, equivalently, the number $n_1$ of 1-choosers. It also allows one 
to determine the variance, which the optimal guidance strategy should minimize in 
favour of reliable recommendations. 



6 Summary and Outlook 

In this contribution, we have discovered that the dynamics of iterated decisions 
based on aggregate information is intermittent. In order to control intermittency, 
we have developed a stochastic description and a non-linear feedback mechanism. 
That is, the application of several physical concepts and methods allowed us to 
gain a detailed understanding of decision dynamics, which is required for practical 
applications. 

In more detail, we have explored different ways of information presentation and identified 
superior ones that help to guide user decisions towards higher payoffs. 
By far the smallest standard deviations from the user equilibrium could be reached by 
presenting the own payoff and the potential payoff, if the respective participant (or 
a certain fraction of players) had additionally chosen the other alternative. Interest- 
ingly, the decision dynamics was found to be intermittent similar to the volatility 
clustering in stock markets, where individuals also react to aggregate information. 
This results from the desire to reach above-average payoffs, combined with the im- 
manent overreaction in the system. We have also demonstrated that payoff losses 
due to a volatile decision dynamics (e.g., excess travel times) can be reduced via 
user-specific recommendations by a factor of three or more. Such kinds of results 
will be applied to the route guidance on German highways (see, for example, the 
project SURVIVE conducted by Nobel prize winner Reinhard Selten and Michael 
Schreckenberg). Optimal recommendations to reach the user equilibrium follow directly 
from the derived balance equations (6) and (7) for decision changes based on 
empirical transition and compliance probabilities. The quantification of the tran- 
sition probabilities needs a novel stochastic description of the decision behavior, 
which is not just driven by the potential (gains in) payoffs, in contrast to intuition 
and established models. To understand these findings, one has to take into account 
individual learning. 

Obviously, progress in decision theory requires both theoretical and experimental 
efforts. In a decade from now, the theory of "elementary" human interactions 
will probably have been developed to a degree that allows one to systematically 
derive social patterns and economic dynamics on this ground in a similar way as 
the structure, properties, and dynamics of matter have been derived from elemen- 
tary physical interactions. This will not only yield a deeper understanding of socio- 
economic systems, but also help to more efficiently distribute scarce resources such 
as road capacities, time, space, money, energy, goods, or our natural environment. 
One day, similar guidance strategies as the ones suggested above may help politi- 
cians and managers to stabilize economic markets, to increase average and individual 
profits, and to decrease the unemployment rate. Physics can contribute to this goal, 
in particular with the methods developed in the fields of non-linear dynamics and 
statistical physics. 